I review recent progress in the determination of the parton structure of the nucleon, in 
particular from deep-inelastic structure functions. I explain how the needs of current and 
future precision phenomenology, specifically at the LHC, have turned the determination 
of parton distributions into a quantitative problem. I describe the results and difficulties 
of current approaches and ideas to go beyond them. 

1. From HERA to the LHC 

Knowledge of the parton structure of the nucleon has undergone a revolution during the 
last decade, driven by present and future experimental data. On the one hand, current 
experiments, especially at HERA pQ but also from neutrino beams at Fermilab, have 
provided us with an unprecedented amount of experimental information, mostly from 
the measurement of deep-inelastic structure functions. On the other hand, the LHC, now 
just around the corner, will require, essentially for the first time, a precision approach to the 
structure of the nucleon in the context of searches for new physics [2]. This has stimulated 
a considerable amount of theoretical and phenomenological work, with the aim of turning 
the physics of parton distributions into a quantitative science. 

2. Determining PDFs 

The parton structure of the nucleon can be determined thanks to factorization: a 
physical cross section is expressed as the convolution of perturbatively computable parton 
cross sections with parton densities (pdfs). We can then use one process to measure pdfs, 
which are then used to compute a different process. In the prototypical case of inclusive 
deep-inelastic scattering (DIS) the cross section, up to corrections suppressed by powers 
of $m_p^2/Q^2$, is given by 



where $\lambda$ are the lepton and proton helicities (assuming longitudinal proton polarization), 
the kinematic variables are $y = \frac{p\cdot q}{p\cdot k}$ (lepton fractional energy loss), $x = \frac{Q^2}{2p\cdot q}$ (Bjorken $x$), 
Figure 1. Kinematic coverage of current data (left, from Ref. [1]) and the LHC (right, 
from Ref. 0). 



and $\eta$ depends on the gauge boson which mediates the scattering process: 
Further contributions are due to interference of different exchange processes. 

The factorization theorem expresses the structure functions which parametrize the cross 
section as a convolution of a perturbative partonic cross section (coefficient function) and 
a pdf. For example 

$$F_2(x,Q^2) = x \sum_{{\rm flav.}\, i} e_i^2 \Big\{ \big(q_i(x,Q^2) + \bar q_i(x,Q^2)\big) + \alpha_s(Q^2) \Big[ C_i[x,\alpha_s(Q^2)] \otimes \big(q_i(x,Q^2) + \bar q_i(x,Q^2)\big) + C_g[x,\alpha_s(Q^2)] \otimes g \Big] \Big\}, \qquad (3)$$

where $[f \otimes g](x) = \int_x^1 \frac{dy}{y} f(x/y)\, g(y)$ and $C_i = 1 + O(\alpha_s)$, $C_g = O(\alpha_s)$ are respectively 
the $i$-th quark flavour and gluon coefficient functions, i.e. the perturbative cross-sections 
for the gauge boson-parton scattering process. The structure function $F_L$ depends on the 
same combination of quarks as $F_2$, but with a different gluon content: 

$$F_L(x,Q^2) = F_2(x,Q^2) - 2x F_1(x,Q^2) = x \sum_{{\rm flav.}\, i} e_i^2\, \alpha_s(Q^2) \Big[ C_i^L[x,\alpha_s(Q^2)] \otimes \big(q_i(x,Q^2) + \bar q_i(x,Q^2)\big) + C_g^L[x,\alpha_s(Q^2)] \otimes g \Big]. \qquad (4)$$
Other structure functions are sensitive to different combinations of parton flavours: the 
$g_i$ structure functions are spin-odd and contribute to the polarized cross section; the 
structure functions $F_3$, $g_4$ and $g_5$ are parity-violating, and contribute to weak current 
scattering. For unpolarized $\gamma^*$ DIS only $F_1$ and $F_2$ contribute. 

Figure 2. Left: impact of Drell-Yan data: top (dots) DIS only; middle (dashes) DIS+E866 
Drell-Yan data; bottom (solid) DIS+E866 and projected LHC Drell-Yan data (from 
Ref. [8]). Right: gluon anomalous dimensions in $N$-space. Plot label: $Q^2 = 9$ GeV$^2$. 

Various physical processes are given by different combinations of the same pdfs con- 
voluted with the appropriate partonic cross sections and kinematics. In particular, for a 
process at a hadron collider (such as Higgs production at the LHC) where the hard scale 
is the mass $M^2$ of the final state, $Q^2 = M^2$ and $x_i = \sqrt{M^2/s}\, \exp(\pm y)$, where $s$ is the squared 
center-of-mass energy of the hadronic collision, $\pm y$ the parton rapidities and $i$ refers to each of 
the incoming hadrons. The kinematic regions for HERA and LHC are compared in fig. 1, 
along with the current experimental coverage of the $(x, Q^2)$ plane from unpolarized DIS 
data. In order to obtain predictions for LHC processes we must solve three problems: 
1) disentangle the contribution of individual partons to the observable used to determine 
them; 2) evolve them up to the relevant scale and convolute them with the appropriate 
perturbative cross section; 3) determine the error on them. 
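
To get a feeling for the region of the $(x, Q^2)$ plane probed at a hadron collider, one can tabulate the momentum fractions for a given final state. The sketch below assumes the standard leading-order relation $x_{1,2} = \sqrt{M^2/s}\, e^{\pm y}$; the function name and the numerical inputs (a 120 GeV final state at a 14 TeV collider) are illustrative assumptions, not values taken from the text.

```python
import math


def parton_momentum_fractions(mass, sqrt_s, rapidity):
    """Return (x1, x2) = sqrt(M^2/s) * exp(+-y) for a final state of mass M."""
    tau = (mass / sqrt_s) ** 2
    # Both momentum fractions must stay below one, which bounds |y|.
    if abs(rapidity) > -0.5 * math.log(tau):
        raise ValueError("rapidity outside the kinematic range x_i <= 1")
    root_tau = math.sqrt(tau)
    return root_tau * math.exp(rapidity), root_tau * math.exp(-rapidity)


# Illustrative numbers (assumptions): M = 120 GeV, sqrt(s) = 14 TeV.
mass, sqrt_s = 120.0, 14000.0
for y in (0.0, 2.0, 4.0):
    x1, x2 = parton_momentum_fractions(mass, sqrt_s, y)
    print(f"y = {y:3.1f}:  x1 = {x1:.2e}  x2 = {x2:.2e}  Q^2 = {mass**2:.0f} GeV^2")
```

At central rapidity both partons carry the same fraction, while at large rapidity one parton is pushed to large $x$ and the other to small $x$, so a single observable can probe both ends of the $x$ range.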

2.1. Phenomenology: disentangling PDFs 

Whereas deep-inelastic scattering mediated by $\gamma^*$ exchange provides the bulk of the 
data shown in fig. 1 (most of the HERA data and older fixed target data), these data only 
measure the cross section of eq. (1), i.e., in turn, the combination of pdfs of eqs. (3-4). This 
means that: 1) only the C-even combination $q + \bar q$ is accessible; 2) flavour separation can 
be done only for $u$ and $d$ quarks using proton and deuteron targets (and then only in fixed 
target experiments, since a HERA upgrade with nuclear beams has not been approved); 
3) the gluon contribution can only be determined through scaling violations: 

$$\frac{d}{d\ln Q^2} F_2^s(N,Q^2) = \frac{\alpha_s(Q^2)}{2\pi} \Big[ \gamma_{qq}(N)\, F_2^s(N,Q^2) + 2 n_f\, \gamma_{qg}(N)\, g(N,Q^2) \Big] + O(\alpha_s^2), \qquad (5)$$
where $N$ is related to $x$ by Mellin transform according to $F_2(N, Q^2) = \int_0^1 dx\, x^{N-1} F_2(x, Q^2)$ 
(and analogously for parton distributions). 
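
The correspondence between $N$ and $x$ can be made concrete by computing a few Mellin moments numerically. This is a minimal sketch assuming the convention $F(N) = \int_0^1 dx\, x^{N-1} F(x)$; the test shape $x^a (1-x)^b$ and its exponents are hypothetical and were picked only because the moments are then given exactly by the Euler Beta function.

```python
import math


def mellin_moment(f, n_moment, n_steps=20000):
    """Midpoint-rule estimate of F(N) = int_0^1 dx x^(N-1) f(x)."""
    h = 1.0 / n_steps
    total = 0.0
    for k in range(n_steps):
        x = (k + 0.5) * h
        total += x ** (n_moment - 1) * f(x)
    return total * h


def toy_shape(x, a=0.5, b=3.0):
    """Hypothetical test shape x^a (1-x)^b; not a fitted distribution."""
    return x ** a * (1.0 - x) ** b


def exact_moment(n_moment, a=0.5, b=3.0):
    """Euler Beta function B(N + a, b + 1), the exact moment of toy_shape."""
    return (math.gamma(n_moment + a) * math.gamma(b + 1.0)
            / math.gamma(n_moment + a + b + 1.0))


for n_moment in (2, 4, 8):
    print(n_moment, mellin_moment(toy_shape, n_moment), exact_moment(n_moment))
```

The weight $x^{N-1}$ suppresses the small-$x$ region more and more strongly as $N$ grows, which is why large $N$ is associated with large $x$ and small $N$ with small $x$.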

Separation of light flavours and antiflavours can be obtained by including different 
observables along with the DIS structure functions [3,4] in the set of processes used to 
determine the pdfs. Specifically, light antiflavour separation is obtained by comparing 
(Tevatron) Drell-Yan production with proton or deuteron targets, because 



$$\frac{\sigma^{pd}}{2\sigma^{pp}} \sim \frac{1}{2}\left(1 + \frac{\bar d(x_2)}{\bar u(x_2)}\right) \qquad (6)$$



and light flavour separation from (Tevatron) W production asymmetry data, which, 
neglecting strange quark contributions, give 



$$\frac{\sigma_{W^-}}{\sigma_{W^+}} \sim \frac{d(x_1)\,\bar u(x_2)}{u(x_1)\,\bar d(x_2)} \qquad (7)$$



Inclusion of these data is mandatory in order to reduce the uncertainty on quark and 
antiquark distributions significantly below 10% (see fig. 2), as required for LHC phe- 
nomenology. 

Strangeness can only be determined through weak current DIS, i.e., essentially neutrino 
data (and some HERA data), which are however scarce and subject to sizable uncertainties. 
As a consequence, in current fits [3,4,7] the shape of the strange distribution is not 
determined, and is assumed to be related in a fixed way to that of the light quark sea; the 
$s$ and $\bar s$ distributions have been determined only in dedicated analyses based on neutrino 
data 0. 
Figure 3. Impact of the NNLO corrections on (left) the coefficient functions and (right) 
the evolution of parton distributions, (from Ref. |B|) 

Finally, because (see fig. 2) $\gamma_{qg} \ll \gamma_{gg}$ at large $N$, the gluon can only be determined 
accurately from eq. (5) at small $N$, i.e., inverting the Mellin transform, at small $x$. In 
some parton sets [3,4] accuracy on the gluon at large $x$ is improved through the inclusion 
of data from inclusive jet production at the Tevatron. 
A better handle on the gluon could be obtained if the contributions of $F_2$ and $F_L$ to 
the cross section eq. (1) could be disentangled. This is possible by varying $y$ at fixed $x$ 
and $Q^2$, which requires varying the beam energy: it would be possible by lowering the 
beam energy at HERA II. On top of this, high luminosity and especially weak scattering 
data from HERA II could somewhat improve current flavour separation [U]. It is unclear 
whether any of these measurements might be performed before the shutdown of HERA. 

A markedly more significant improvement in knowledge of parton distributions is expected 
from the coming into operation of the LHC itself, even though only extremely 
limited studies of the impact of the LHC on PDFs are available so far [U]. In the longer term, 
full information on the flavour decomposition of the nucleon could come from a neutrino 
factory [11]. 

2.2. Theory: perturbative coefficients and evolution 

However abundant the data, pdfs can be extracted from them only through the use of 
perturbative parton cross sections which are needed to relate them to physical observables 
and the anomalous dimensions which are required to evolve them to a common scale. 
Hence, the theoretical uncertainty on pdfs is always at least as large as the size of the 
unknown perturbative corrections. Until very recently, only inclusive Drell-Yan and DIS 
partonic cross sections were known at NNLO [12]. In the last couple of years, thanks 
to the development of new computational techniques, NNLO results have been obtained 
for inclusive Higgs production [13] and, more importantly, for a number of less inclusive 
observables, specifically Drell-Yan and W rapidity distributions in hadronic collisions [14]. 
On the other hand, the full set of NNLO anomalous dimensions or splitting functions has 
also been determined [15] after an effort of more than a decade. 

The impact of NNLO corrections to DIS coefficients and evolution is displayed in fig. 3. 
Clearly, their inclusion is required if one aims at achieving a determination of parton 
distributions to percent accuracy. Their impact on specific observables (such as $F_L$) or 
in particular kinematic regions (such as very small or very large $x$) can be even more 
dramatic. The large size of NNLO corrections at small and large $x$ signals the need for 
all-order resummation of the perturbative expansion in these regions. Indeed, higher 
order perturbative corrections are known to be enhanced by logs of $x$ and $(1-x)$ through 
terms of the form $\alpha_s \ln\frac{1}{x}$ and $\alpha_s \ln^2(1-x)$. When $x$ is small or large enough the log 
enhancement can offset the suppression due to the strong coupling $\alpha_s$. 
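
A rough numerical illustration of where the logarithmic enhancement offsets the coupling: assuming enhanced terms of the form $\alpha_s \ln(1/x)$ and $\alpha_s \ln^2(1-x)$ with unit coefficients (an assumption, since the actual coefficients are process dependent), one can solve for the value of $x$ at which each term becomes of order one. The function names and the sample values of $\alpha_s$ are hypothetical.

```python
import math


def x_where_small_x_log_is_order_one(alpha_s):
    """Solve alpha_s * ln(1/x) = 1 for x."""
    return math.exp(-1.0 / alpha_s)


def x_where_large_x_log_is_order_one(alpha_s):
    """Solve alpha_s * ln^2(1 - x) = 1 for x."""
    return 1.0 - math.exp(-1.0 / math.sqrt(alpha_s))


# Illustrative values of the coupling (assumptions, not taken from the text).
for alpha_s in (0.12, 0.2, 0.3):
    print(alpha_s,
          x_where_small_x_log_is_order_one(alpha_s),
          x_where_large_x_log_is_order_one(alpha_s))
```

These numbers only indicate orders of magnitude: the onset of the resummation region in a real observable depends on the coefficients of the logarithms and on the shape of the pdfs they are convoluted with.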

The impact of large $x$ corrections beyond NNLO on the extraction of pdfs from DIS 
has been estimated [16] to be negligible, at least if one imposes a cut on the center-of-mass 
energy of the $\gamma^* p$ process, $W^2 = Q^2 \frac{1-x}{x} \gtrsim 10$ GeV$^2$. On the other hand, once less 
inclusive observables (e.g. the differential Higgs production cross section) are considered 
and experimental cuts are taken into account the impact of large $x$ corrections can become 
sizable [17]. Because large $x$ resummations are known exactly and their inclusion poses 
no problem of principle it would be advisable to include them in future parton sets. 

The impact of small x corrections is dramatically highlighted by the recent NNLO 
splitting function determination. Indeed (fig. 4) the perturbative expansion of the splitting 
function is unstable at small x, but on the other hand the logarithmically enhanced small 
x terms (leading singularities) are not a good approximation to the full result. This 
means that small x contributions have to be resummed, but this resummation must also 
Figure 4. Left: the splitting function ($P_{gg}$ with $n_f = 0$) at LO (solid, black), NLO 
(dot dashed, green), NNLO (dashed, red), NNLO leading singularities (lower dotted, red), 
NNNLO leading singularities (upper dotted, blue). The two solid (blue) curves with a 
dip are the small $x$ resummations of Ref. [18] and Ref. [19]. Right: best-fit gluon at NLO 
(bottom, solid), NNLO (middle, dashed), and NNLO evolution of the NLO best-fit (top, 
dotted, from Ref. 0). 

be combined with the available fixed-order results in order to lead to a stable answer. The 
required formalism has been developed over the last decade by various groups and is now 
converging to an answer, but the relevant phenomenology has not been developed yet. 
Current results [18,19] show (see fig. 4) that the resummation stabilizes the NLO results, 
so that at small $x$ the fully resummed result is actually closer to the low order (LO and 
NLO) results. The impact of NNLO corrections on the extracted gluon can be larger than 
100% at small $x$ and small scale (see fig. 4). Hence, the resummation is mandatory if one 
wishes to work to NNLO accuracy. 

3. Partons and errors 

Precision phenomenology needs not only a knowledge of parton distributions, but also of 
the error with which they are determined. The problem here is that parton distributions 
are functions, hence the error on them is really a probability measure in an infinite- 
dimensional space. Therefore, it cannot be determined from a finite set of data without 
extra theoretical assumptions. These assumptions in current parton fits take the form of 
a functional form: pdfs are assumed to have, at a reference scale, a given functional form, 
parametrized by a finite set of parameters. They are then evolved to the scale of the data 
and used to compute physical cross-sections which are then fitted to the data, thereby 
determining the parameters. If the full information of the covariance matrix of the data 
is retained and propagated to the parameters it is then possible to determine errors on 
pdfs. Within the last few years, three parton sets with errors have been obtained in this 
way [3,4,7]. 





Figure 5. Left: minimum $\chi^2$ for individual experiments as the total $\chi^2$ is allowed to 
increase (from Ref. [20]). Right: contribution to the $\chi^2$ of the global MRST fit from 
individual experiments (from Ref. 

3.1. The problems 

The results of current global parton fits look nominally quite good, with uncertainties 
of order of a few percent on quark and antiquark distributions and at most 10% on the 
gluon (see fig. 2). However, closer inspection reveals a number of problems. Indeed, 
consider the variation of the $\chi^2$ of the fit to each dataset as the $\chi^2$ of the global fit is 
allowed to vary [20]. This study reveals that the fit to the CCFR neutrino DIS data or 
the BCDMS muon-deuterium DIS data can be improved very considerably at the expense 
of deteriorating the global fit (fig. 5). This indicates that the global fit with the given 
functional form is far from the best fit to these data. Also, one may study the contribution 
of individual datasets to the $\chi^2$ of the global fit as one of the fit parameters is varied. 
This (fig. 5) shows that the minimum of the global fit does not correspond to minima of 
individual experiments. 

While to some extent these could just be statistical fluctuations, they may signal more 
serious problems. On the experimental side, they may signal that some datasets are not 
fully consistent with the others. On the theoretical side, they may signal that assumptions 
on the functional form are not flexible enough to accommodate the data. 

Whatever the precise cause, it appears that current determinations of errors on pdfs 
have not yet settled to a satisfactory agreement. This is dramatically apparent if one looks 
at the prediction for even the simplest inclusive LHC observable, such as the total Higgs 
production cross section (fig. 6): the results obtained using recent parton sets disagree 
within the respective error bands, especially when the quark flavour separation comes 
Figure 6. Higgs production cross section at the LHC computed using various parton sets 
(from Ref. [21]). 



into play, such as in HW production. 

3.2. Conservative solutions 

The problems of data incompatibility and possible parametrization bias have been 
tackled in various ways in current parton sets. A first option is to replace the standard 
one-sigma contours with parameter ranges obtained studying the compatibility of the fit 
with various experiments. In ref. [3] this has been done by studying the spread of 90% 
confidence intervals for various experiments, as one moves away from the minimum of 
the $\chi^2$ along eigenvectors of the Hessian matrix, and taking the envelope of the resulting 
ranges. In practice, this suggests that $\Delta\chi^2 = 100$ for the global fit leads to a reasonable 
estimate of the one-sigma contours for pdfs (see fig. 7). In ref. [3] $\Delta\chi^2 = 50$ is adopted 
instead, and seen to lead to results which are not so different for the pdf error bands. 
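
The effect of the choice of $\Delta\chi^2$ on the quoted error can be illustrated in the quadratic (Hessian) approximation. This is a sketch under stated assumptions, not the procedure of any specific parton fit: it assumes $\chi^2 = \chi^2_{\rm min} + \delta p^T H\, \delta p$ near the minimum and a linearized observable, in which case the error is $\sqrt{\Delta\chi^2\; \nabla F^T H^{-1} \nabla F}$. The two-parameter Hessian and the gradient below are invented numbers.

```python
import math


def invert_2x2(h):
    """Inverse of a symmetric positive-definite 2x2 matrix ((a, b), (b, d))."""
    (a, b), (_, d) = h
    det = a * d - b * b
    if a <= 0.0 or det <= 0.0:
        raise ValueError("Hessian must be positive definite")
    return ((d / det, -b / det), (-b / det, a / det))


def observable_error(gradient, hessian, delta_chi2):
    """Linearized error on an observable for a given chi^2 tolerance.

    Assumes chi2 = chi2_min + dp^T H dp, so that the largest shift of the
    observable on the contour chi2 - chi2_min = delta_chi2 is
    sqrt(delta_chi2 * grad^T H^-1 grad).
    """
    h_inv = invert_2x2(hessian)
    quad = sum(gradient[i] * h_inv[i][j] * gradient[j]
               for i in range(2) for j in range(2))
    return math.sqrt(delta_chi2 * quad)


# Hypothetical Hessian and gradient, for illustration only.
hessian = ((400.0, 120.0), (120.0, 100.0))
gradient = (1.0, 0.5)

for delta_chi2 in (1.0, 5.0, 50.0, 100.0):
    print(delta_chi2, observable_error(gradient, hessian, delta_chi2))
```

In this approximation the error scales as $\sqrt{\Delta\chi^2}$, so moving from $\Delta\chi^2 = 1$ to $\Delta\chi^2 = 100$ inflates every error band by a factor of ten, while $\Delta\chi^2 = 50$ and $\Delta\chi^2 = 100$ differ only by a factor $\sqrt{2}$.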

However, the need to choose ad hoc a large value of $\Delta\chi^2$ is somewhat disturbing. An 
alternative suggestion has been made in Ref. [3], where it is observed that most of the 
trouble seems to come from specific kinematic regions where theoretical uncertainties 
become large: the low $Q^2$ region where the perturbative expansion converges slowly, and 
the large and small $x$ regions where resummation is necessary. It is then shown that 
by imposing more restrictive cuts in $Q^2$, $x$ and $W^2$ (see sect. 2.2 above) a much more 
palatable value $\Delta\chi^2 = 5$ can be taken to determine the error on pdfs. The 'conservative' 
partons obtained in this way can differ by more than 10% from standard ones (see fig. 7). 
The problem is that clearly there is information loss in the process, and predictions become 
unreliable when regions are probed which have been excluded from the fit due to the cuts 
(such as the very small $x$ region). 

Still, one would like to be able to rely on purely statistical arguments to construct one 
sigma contours. To this purpose, in ref. [7] it has been observed that many problems 
seem to come from the need to combine different data sets. Indeed, in ref. [7] it has been 
demonstrated that if only DIS data are included in the global fit, and the full covariance 
Figure 7. Left: 90% probability ranges for each experiment as a function of the distance 
from the minimum along a typical eigenvector of the covariance matrix. The blue band 
corresponds to $\Delta\chi^2 = 100$ (from Ref. 0]). Right: ratio of MRST 'conservative' partons 
to the reference MRST set (from Ref. [3]). 

matrices of experiments are taken into account, it is possible to achieve a statistically stable 
fit where one-sigma error bands are given by $\Delta\chi^2 = 1$. The problem is that DIS data 
alone are insufficient to determine for instance the quark flavour decomposition. However, 
preliminary results [8] suggest that DIS data can be combined e.g. with Drell-Yan data, 
provided only that the $\chi^2$ from different datasets are brought to a common normalization. 
Satisfactory errors on all pdfs are then obtained (see fig. 2). 

The fact that different prescriptions seem to be able to solve at least in part the problems 
of global fitting suggests that the origin of these problems is not yet fully understood. 

3.3. New ideas 

The difficulties encountered in current parton fits suggest that perhaps the conventional 
approach is now reaching its limitations, and has led to the suggestion of alternative ap- 
proaches. A first proposal [22] is to use Bayesian inference to update a prior representation 
of the probability density which is generated as a Monte Carlo sample based e.g. on an 
available parton parametrization. The final result should be largely independent of the 
choice of prior if the data are sufficiently abundant. The main difficulties with this ap- 
proach are related to the need to keep the computational complexity under control, in 
particular in the choice of priors and in the handling of flat directions, i.e. Monte Carlo 
replicas which lead to similar values of the $\chi^2$. A preliminary set of partons ('Fermi' partons) 
has been constructed within this approach [22] (see fig. 8). The results suggest that 
indeed a treatment of non-gaussian probability densities may be required if one wishes to 
combine experimental information from different sets. However, no satisfactory global fit 




Figure 8. Left: Fermi partons (from Ref. [22]). Right: nonsinglet neural partons (preliminary, 
[23]). 

within this approach has been obtained yet: in particular, the preliminary results do not 
lead to a satisfactory value of the strong coupling. 

Another approach has been suggested in ref. [2H], based on the idea of using neural networks 
as universal unbiased interpolants. In this approach, the data are used to generate 
a Monte Carlo sample which represents the probability measure in the space of functions 
at the points at which data exist. Neural networks are then used to interpolate between 
these points: the ensuing Monte Carlo set of neural networks is then the sought-for probability 
in the space of functions. This approach has been used in ref. [2H] to parametrize 
all available $F_2$ data, but without extracting the contribution of individual pdfs to the 
structure function. Preliminary results on a pdf extraction based on this method have 
been presented in ref. [23] (see fig. 8). They suggest that fixed functional forms may be 
too rigid in estimating errors, especially at the edges of the data region. The feasibility of 
a full parton set based on this approach, which is also computationally quite intensive, is 
however still to be demonstrated. 
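
The Monte Carlo part of this strategy can be sketched in a few lines. The example below is only an illustration of the replica idea under strong simplifying assumptions: the pseudo-data are invented, the errors are taken as uncorrelated, and an unweighted straight-line fit (`fit_line`) stands in for the training of a neural network, which would be a far more flexible interpolant. All names are hypothetical.

```python
import random
import statistics


def make_replicas(central, sigma, n_rep, rng):
    """Fluctuate each data point with Gaussian noise (uncorrelated errors assumed)."""
    return [[rng.gauss(c, s) for c, s in zip(central, sigma)]
            for _ in range(n_rep)]


def fit_line(xs, ys):
    """Unweighted least-squares line; a stand-in for training one neural network."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Invented pseudo-data (not measurements): values and one-sigma errors.
xs = [0.1, 0.3, 0.5, 0.7]
central = [0.40, 0.32, 0.21, 0.10]
sigma = [0.02, 0.02, 0.03, 0.03]

rng = random.Random(12345)
fits = [fit_line(xs, rep) for rep in make_replicas(central, sigma, 1000, rng)]


def prediction(x):
    """Mean and standard deviation over the replica ensemble at the point x."""
    values = [slope * x + intercept for slope, intercept in fits]
    return statistics.mean(values), statistics.stdev(values)


for x in (0.4, 0.9):  # inside and beyond the range covered by the pseudo-data
    print(x, prediction(x))
```

Any quantity derived from the fitted function is then estimated by averaging over the ensemble, and its uncertainty by the spread over replicas, without linearized error propagation. Even with this rigid stand-in the spread grows outside the region covered by the pseudo-data.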

These approaches have in common the feature of trying to use the available experimental 
information in a way which is free of theoretical assumptions. 

4. Conclusions 

Perturbative QCD phenomenology has become the object of precision quantitative studies 
during the last decade and it is now on a similar footing as precision electroweak 
phenomenology. However, unlike in the electroweak case, the impossibility of computing 
the structure of the nucleon from first principles poses taxing problems of data analysis. 
A satisfactory agreement between different determinations of parton distributions has 
not been reached yet, especially at the level of error determination, and may require the 
development of entirely new techniques. 
the unpublished plots shown in figs. 2-4. 